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Abstract 

We derive central limit theorems for the Wasserstein distance between the empir¬ 
ical distributions of Gaussian samples. The cases are distinguished whether the 
underlying laws are the same or different. Results are based on the (quadratic) 
Frechet differentiability of the Wasserstein distance in the gaussian case. Ex¬ 
tensions to elliptically symmetric distributions are discussed as well as several 
applications such as bootstrap and statistical testing. 
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1. Introduction 

Let P, Q be in Ati(M'^), the probability measures on Gonsider tt^ : 
X ^ R,x = {xi,X2) 1-^ Xi, i = 1,2, the projections on the first or the 
second d-dimensional vector, and define 

n(P, Q) = {^ g X : fj, o = P, /x o = Q} 
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as the set of probability measures on x with marginals P and Q. Then 
for p > 1 we define the p- Wasserstein distance as 

(1) yyp(P,Q) := inf , f / \\x-y\\^ tJ‘idx,dy)] 

/ien(p,Q) \7R2d / 

There is a variety of interpretations and equivalent definitions of Wp, for example 
as a mass transport problem; we refer the reader for extensive overviews to 
Villani m and Rachev and Riischendorf m- 

In this paper we are concerned with the statistical task of estimating >Vp(P, Q) 
from given data Xi,..., X„ ~ P i.i.d. (and possibly also from data Yi,..., ^ 

Q i.i.d.) and with the investigation of certain characteristics of this estimate 
which are relevant for inferential purposes. Replacing P by the empirical mea¬ 
sure Pji associated with Xi,...,X„ yields the empirical Wasserstein distance 
Wp,n := Wp(Pn,Q) which provides a natural estimate of >Vp(P,Q) for a given 
Q. Similarly, define >Vp,„,m := Wp(Pn,Qm) in the two sample case. For inferen¬ 
tial purposes (e.g. testing or confidence intervals for >Vp(P, Q)) it is of particular 
relevance to investigate the (asymptotic) distribution of the empirical Wasser¬ 
stein distance. 

This is meanwhile well understood for measures P, Q on the real line K 
as in this case an explicit representation of the Wasserstein distance (and its 
empirical counterpart) exists (see e.g. p2l[28l 1^l3^l38ll44] 1 

(2) WP{F,Q)= [ \F-\t) - G-\t)\P dt. 

Here, F{x) = P((—oo,a;]) and G{x) = Q((—oo,a;]) for x S K denote the c.d.f.s 
of P and Q, respectively, and F~^ and G~^ its inverse quantile functions. Now, 
Wp^n is defined as in © with F ^ replaced by the empirical quantile function 
F~^, and the representation ([^ can be used to derive limit theorems based on 
the underlying quantile process \/n{F~"^ — F~^). These results require a scaling 
rate (a„)„gN such that the laws 

(3) a„ (^W^(P„,Q) - >V^(P,Q)-k , as n ^ oo 

(for some centering sequence {bn)neN) converge weakly to a (non-degenerate) 
limit distribution. Depending on whether F = G as well as on the tail behavior 
of the distributions F and G we find ourselves in different asymptotic regimes. 
Roughly speaking, when F = G (i.e. P = Q, >Vp(P, Q) = 0), a„ = n is the 
proper scaling rate, i.e. the limit is of second order and given by a weighted 
sum of laws (see e.g. [mn]). In general, depends on the tail behavior 
of F. In contrast, when F ^ G, i.e. >V^(P, Q) > 0 for a„ = ^/n, = 0 the 

limit is of first order and •\/n(yVp(Pn, Q) — >Vp(P, Q)) is asymptotically normal 
(see [23l[34]) under appropriate tail conditions. Various applications of these 
and related distributional results, e.g. for trimmed versions of the Wasserstein 
distance, include the comparison of distributions and goodness of fit testing 
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([SlITUlllllMl)) template registration (Section 4 in [HIT]), bioequivalence testing 
atmospheric research ([IH]), or large scale microscopy imaging ([SH]). 

In contrast to the real line (d = 1), up to now limiting results as in (|^ 
remain elusive for d > 2. However, see [2] and m for almost sure limit 

results and m for moment bounds on Already the planar case d = 2 is 

remarkably challenging ( 0 )- One difficulty is that no simple characterization as 
in ([^ via the (empirical) c.d.f’s exists anymore. In particular, the couplings for 
which the infimum in 0 is attained are much more involved, see e.g. [311138] . 
We will come back to this in the context of our subsequent results later on. 

In this article we aim to shed some light on the case d > 2 by further 
restricting the possible measures P, Q to the Gaussians (and more generally to 
elliptical distributions). Here, a well known explicit representation of >Vp(P, Q) 
can be used (see e.g. [H], [36], [25]) which allows one to obtain explicit limit 
theorems again. The Gaussian case is of particular interest as it provides, as 
shown in |25| , a universal lower bound for any pair (P, Q) having the same 
moments (expectation and covariance) as the Gaussian law, see also |5|. 

Limit laws for the Gaussian Wasserstein distance. More specifically, from now 
on let the laws P, Q S A4i(K‘^) be in the class of d-variate normals, i.e. 

(4) P ~ iV(/r, E) and Q ^ N{v, S) for some fJ.,v € S, S S S'_|_(IR‘^), 

the symmetric, positive definite, d-dimensional matrices. From now on we will 
also restrict to p = 2. In this case the Wasserstein distance between 7V(/i, E) 
and N{iy,E) is computed as (see [T9||27||36|) 

(5) gw := W|(P, Q) = ll/r - i/f -h tr(E) -k tr(S) - 2 tr 

Here, tr refers to the trace of a matrix and its square root is defined in the usual 
spectral way. The norm || • || is the Euclidean norm with corresponding scalar 
product denoted by (•,•). Now, if we replace P with the empirical measure P„ 
and read /r and E as a functional of P, we obtain the empirical Wasserstein 
estimator GWn restricted to the d-dimensional Gaussian measures as 

gWn=gWn{Xi,...,XnM 

(6) : = W|(iV(/i„,E„),iV(^.,S)) 

= Wfi - -I- tr(S) -I- tr(5) - 2tr 

Similar to the case of the general empirical Wasserstein distance for d = 1 
we find in the following that the asymptotic behavior differs whether P = Q, 
i.e. 11 = 1 ' and E = 5 or P ^ Q. Let us start with the latter case which turns out 
to be simpler. We show in Theorem |2.1[ whenever P ^ Q, i.e. p, ^ or E S 
a limit theorem as in 0 holds with a„ = and = 0, i.e. as n —)■ oo, 

(7) V^(wi (iV(A„,S„),Q) ->V|(P,Q)) ^iV(0,u2). 


^1/22S1/2 


1/2 


E1/22E1/2' 


-, 1/2 
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Here the asymptotic variance can be explicitly computed as 


d 

(8) ^ - fi) + 2tr(E2 + SS) - 4^ 

k=l 


where {(k^, r]^) : A: = 1,..., d} denotes the eigendecomposition of the symmetric 
matrix into orthonormal eigenpairs (consisting of eigenvalues and 

eigenvectors). Here and in the following we denote by * the transpose of a vector 
(or matrix). We will also treat the two sample case (Theorem |2.2[ ), where Q is 
additionally estimated under the Gaussian restriction by a second independent 
sample. In this case, the eigendecomposition of E itself given by {{Xi,pi) : 
I = 1,... ,d} additionally occurs in the limiting variance, which disappears for 
distinct eigenvalues of E, however. 


Our proof relies on the Frechet differentiability of the Wasserstein-distance 
in the Gaussian case (see Theorem 2.4) together with a Delta method (Theo¬ 
rem 4.1). The formula for the Gaussian Wasserstein distance ([^ can be seen 
as a (non-linear) functional of symmetric operators. The proof of its Frechet 
differentiability is based on the second order perturbation of a general compact 
Hermitian operator (see Corollary |B.2[ ); this result is of interest on its own. In a 
similar way we treat the case P = Q. Here, the first derivative vanishes and we 
show that the asymptotic distribution is determined by the Frechet derivative 
of second order. This gives for a„ = n and &„ = 0 a non-degenerate limit which 
can be characterized as a quadratic functional of a Gaussian r.v., see Theorem 
2.3 Note that all scaling rates are independent of the dimension d {d enters 


in the constants, though) and coincide with those for dimension d = 1 for the 
general empirical Wasserstein distance based on ([^. 


Comparison to known results in d > 2. Although distributional results of Wp,n 
are not known for d > 2 it is illustrative to discuss our limit results in the light 
of some known results on bounds of the moments of Wp^n and a.s. limits. We 
will restrict to p = 2, since our results apply only to that case. 

A particularly well understood case for P = Q is the uniform distribution 
on the d-dimensional hypercube, see 0, m and m- In [T^ it is shown that 
c„W|(P„,P) —>■ A almost surely for a certain A S (0, oo) and Cn = when 
d > 3. To the best of our knowledge a finer distributional asymptotics in the 
sense of 


(9) a„(W|(P„,P) + 6„)^Z 


for bn = —Xjcn with non-degenerated r.v. Z is not known. If it existed, it would 
require c„ = o(a„). Our Theorem 2.3 affirms the existence of the limit in 


for the Gaussian Wasserstein estimator with a„ = n and = 0. The fact that 
On = n grows faster than c„ = for d > 3 was expected by the previous 
argument. Similar argumentation holds for the result in [5] where upper and 
lower almost sure bounds Ai < A 2 € (0,oo) are given for c„ = n/(logn)^. To 
subsume the comparison to the almost sure results: our rate a„ = n is in the 
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range of possible rates, i.e. l/a„ = o(l/c„), which may be expected for non¬ 
trivial distributional limit results. Recall, however, that we are not proving a 
limit as in Q, but in the sense of Q, where W|(P„,Q) is replaced by 

It is also interesting to compare our rate for P = Q to moment bounds. In 
PT] upper bounds are given for £'[H’|(P„,P)] when P is a measure with finite 
moments of any order (recall that Gaussian distributions have moments of any 
order). They obtain that dnElW^i^, P)] < C for a constant C < ooiidn = 
for d > 5, dn = for d < 3 and d„ = n^/^(log(l -I- n))~^ for d = 4. All those 
results are consistent with our result in the sense that limsup„_,,oc dn/an < oo. 

So far we have only discussed the case P = Q. The literature on the case 
P ^ Q is much scarcer. For the case d = 1, [33] obtain in situations comparable 
to ours also asymptotical normality for a„ = and = 0. This is the same 
scaling rate as the one observed in our case. For higher dimensions we do not 
know of any explicit results. Theorem 3.9 in |15j gives a result which is similar 
to 0 except that their setting deals with an incomplete transport problem. 
Their rate for d > 1 is also a„ = and they obtain that the left hand side of 
the corresponding version of Q is bounded in probability. In particular, they 
do not state an explicit limit law as we can give it in our special situation. 

To the best of our knowledge, other results are not yet available for the case 
P ^ Q in higher dimensions. 

Elliptical distributions. It is possible to generalizate the result in Q beyond 
the class of Gaussian distributions. As [25] showed in Theorems 2.1 and 2.4, 
formula 0 holds for more general classes of distributions (with appropriate 
modifications); i.e. elliptically symmetric probability measures. We comment 
on this generalization in Remark |2.4] 

Statistical Applications: Inference and Bootstrap. A bootstrap limiting result 
follows immediately from our proof as well. For the case of P and Q being differ¬ 
ent the first order term in the Frechet expansion determines the asymptotics and 
an n out of n bootstrap is valid (see e.g. [45]). Other resampling schemes, such 
as a parametric bootstrap can be applied as well (see e.g. [H]). When P = Q 
the second order Frechet derivative matters and one has to resample fewer than 
n observations, i.e. o{n) (see e.g. 0) to obtain bootstrap consistency. As the 
limiting laws are rather complicated, bootstrap seems to be a reasonable option 
for practical purposes, e.g. confidence intervals for the Wasserstein distance in 
the Gaussian case can be obtained from this m l4T] . We provide more de¬ 
tails on bootstrapping the Gaussian Wasserstein distance and an application to 
structure determination of proteins in Section [23] 

The paper is organized as follows. In Section [2.1[ we present the main results 
on the asymptotic distribution in the one and two sample case and in Section [2.2| 
we provide the main results on Frechet differentiability of the underlying func¬ 
tional. Next, we give two applications, one theoretical application regarding the 
bootstrap in Section [2.3] and a practical one regarding a data example for the 
positions of amino acids in a protein in Section [^ In Section [^ we present the 
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proofs of the main theorems. Finally, the appendix comprises some required 
facts and technical results on functional differentiation. 

2. Main Results 

2.1. Limit laws for the Gaussian Wasserstein distance 

This section contains the three main results on convergence of the empirical 
Wasserstein distance estimator QWn defined in §. The first two theorems 
present the case where P 7 ^ Q and the last one states the result for P = Q. 

Suppose now that P 7 ^ Q are both Gaussian distributions as in Q. We 
denote their Wasserstein distance by QW := ^yV(P, Q). Suppose we have inde¬ 
pendent samples from these different distributions. Then we obtain the following 
result. 

Theorem 2.1 (Asymptotics for the empirical Wasserstein distance in the one 
sample case, P 7 ^ Q). Let P 7 ^ Q m he Gaussian, P ^ ^ 

N^v, S) with E and S having full rank. Let Xi ,..., A„ A^(/i, E) and con¬ 
sider the Gaussian Wasserstein estimator QWn from (§. Then as n ^ 00 , 


(10) (gWn - ^ A^(0, 

where 

= 4(1/ — ^)*E(i^ — ^) -I- 2 tr (E^) -|- 2 tr (ES) 

- 4 ^ 

k=l 


Here, {(k^, rjf) : fc = 1,..., d} denotes the eigendecomposition of the symmetric 
matrix into orthonormal eigenpairs (consisting of eigenvalues and 

eigenvectors). 

In many practical applications we may not have direct access to the param¬ 
eters of the distribution Q = N(v, S) and we merely have a sample from that 
distribution. The generalization of the estimator from (|^ for the two sample 
case is given as 


(12) ^„,„(Ai,...,A„,ri,...,r^) :=Wl (iV(A„,E„),iV(^4„,S^)) 


The following result is the two sample analogue to Theorem 2.1 


Theorem 2.2 (Asymptotics for the empirical Wasserstein distance in two sam¬ 
ple case, P 7 ^ Q). Let P 7 ^ Q m A4i(M'^) be Gaussian, P ~ A^(^, E), Q ~ N{v, 2) 
with E and 2 having full rank. Let n S N and m = m{n) S N such that 
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n/m{n) —>■ a/(l —a) as n ^ oo for a certain a € (0,1). Consider the i.i.d. sam¬ 
ples {Xi,..., Xn) and (Yi,..., Ym) with joint law P and Q, respectively. Then 
as n —>■ oo, 

(13) IJI^ (gWn,m - QW) ^ 7V(0, tZ72), 

\ m n \ / 

where 


(14) = 4(i/ — /i)*((l — a)S + aX){v — /x) + 2tr((l — a)E^ + 

d 

+ 2otr(ES) - 4^ 4^"g^((l - a)E + 

k^l 

d d d 

- 2(1 - a) XI X X <fmphk<llpjp]qk- 

k,l—l i —1 j^i 


Here, {{Ki,qi) : I = 1,... ,d\ denotes the eigendecomposition of the symmetric 
matrix E^'^^SE^/^ into orthonormal eigenpairs (consisting of eigenvalues and 
eigenvectors) and {{Xi,pi) : I = 1,... ,d} is the eigendecomposition ofY. 


Remark 2.1 (Distinct eigenvalues of E). We note that the last term of (141 
disappears if all eigenvalues A/, Z = 1,..., d of E are distinct. 

Remark 2.2 (Commutative case E5 = 5E). In this case, we can choose the 
eigenbasis of E and S to be the same, which is thus also an eigenbasis of 

E 1 / 22 E 1/2 

implying that plqi = Sn. Denoting by AJ. the eigenvalues of .n we 
have that A^A). = Kk- Using this, we see that the last term of (14) disappears. 
For a = 1/2 the last term of ( [TT| ) and the second and third last term of (14) sim¬ 
plify to —4tr((E^/^5E^/^)^/^E) and —4tr((E^/^SE^/^)^/^(E-|-S)), respectively. 
Thus, in this case for our two previous theorems the following simplifications 
apply 


= 4(i/-/r)*E(!/-/i)-t2tr(E(Ei/2-Si/2)2^ ^ 

= 4(j^-/r)*(E-HS)(:y-^)-H2tr(E(Ei/2_si/2)2^ 

+ 2tr (5(51/2 _ £1/2)2^ ^ 


The result is restricted to the case of covariance matrices of full rank. We 
comment on that restriction in Remark 14.21 

Note in the previous result that the variance 137 is zero if P = Q, i.e. jj. = v 
and E = 5, see also Remark |2.6[ In this case a second order expansion provides 
a valid limit law. Namely, the following theorem holds. 


Theorem 2.3 (Asymptotics for the empirical Wasserstein distance, P = Q). 
Let P S AIi(K‘^) he Gaussian: P ^ N{p,'E,) with E having full rank. Consider 
the i.i.d. samples (X^^ • ■ • > and {x[^\ ..., X^'^) with joint law P and the 
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Gaussian Wasserstein estimators QWn from ([^ and QW„^n 
as n —> oo, 


from (12). 


Then 


(15) ngWn^Zi, 


and 

(16) nQyVn,n ^ Zi , 


where Zi and are random variables, characterized in (40). 


Remark 2.3 (The limiting distribution for P = Q). An explicit description of 
the limit in (40) is difficult in general, a simplification can be given in the one 
dimensional case, see (32). 

Analogously to Theorem |2.2| it is also possible to obtain the asymptotics in 
the case that the sample sizes for P and Q are not the same. In the regime 
n/m —)■ a/(l — a) for a certain a € (0,1) as n —>■ oo the convergence holds for 
{{nm)/{n + m)) GWn,m- 


Remark 2.4 (Generalization to elliptically symmetric distributions). Gelbrich 
pS] showed that formula © holds for any two elements P, Q € AIi(K'^) which 
are translations of distributions whose covariance matrices are related in a cer¬ 
tain way. He also showed that this condition is fulfilled as long as they are in 
the same class of elliptically symmetric distributions. The class of Gaussian 
distributions is such a class. 

More generally, denote by 5”° (M'^) the non-negative definite, symmetric ma¬ 
trices and by rk^ the rank of any A G ^((.(K^^). Let / : [0,oo) —>• [0, oo) be a 
measurable function that is not almost everywhere zero and that satisfies 



ltl‘f(t^)dt < oo. 


I = d— l,d,d+ 1, 


and set ca = f f{{x,Ax))dx. Then, one can consider classes of the form 

(18) M{ (K'^) := {P G Mi{lmA) ■ A G S'^(K'^) with rk^ = d,P has density 
fA,v{x) = CAf{{x - v,A{x - v))),x G G 


see Theorem 2.4 of [5^ (there stated also for matrices A that do not have full 
rank, which is not considered here). 

As can be seen from 0 and ( |18[ ) by setting / = l[o,i] another prominent 
example for elliptically symmetric distributions is that of uniformly distributed 
probability measures on ellipsoids, i.e. on sets of the form Ua,v ■= {cc G : 


{x — v,A[x — v)) < 1} for A G S'lJ.) 


^) and V G Furthermore, we obtain 


the multivariate t-distributions (with n > 0 degrees of freedom) by setting 
f(t) = (1 + These play a particular role for copulae models, see [TO] . 


As the largest part of the proofs of Theorems 2.1 2.2| and |2.3| relies only 
on the specific form of formula ([^, the results of these theorems immediately 
transfer to other classes of elliptically symmetric distributions. What still needs 









to be verified in the various cases is that a central limit theorem holds for 
the empirical mean and for the covariance matrices, see our Lemma 4.2 in the 
Gaussian case. For example, this requires v >2 for the class of multivariate t- 
distributions to guarantee the existence of second moments. The specific form of 
the analogous limits in Theorems |2.1[[2?^ and |2.3| will depend on the specific form 
of the limit in the appropriate central limit theorem and has to be computed 
from case to case. 


2.2. Frechet differentiability of the Gaussian Wasserstein distance 

The concept of differentiation on Banach spaces will be an important tool 
for the proof of the results in the previous section. We give a comprehensive 
reminder of some classical results for Frechet derivatives in Section]^ Moreover 
some more advanced results about a Taylor expansion of a functional of an 
operator may be found in Section [B| 

Now we consider the 2-Wasserstein distance of Gaussian distributions as a 
functional of their means and covariance matrices (see ©)• In the following 
we show its Frechet differentiability and explicitly derive its Frechet derivative. 
To this end, consider A, B G S+{]SA‘) C ~ (symmetric, positive 

definite matrices). We use the eigenvalue decomposition for A and Afl'^BAAl'^ 
of the form 


(19) 


d 

2=1 


i=l 


where Xi,Ki > 0, PiPj = SijPi, QiQj = dijQi,l < i < j < d. Our decomposition 
implies that all projections P^, Qj are onto one dimensional spaces such that 
we can write Qi = qiql and Pi = pip\ for vectors qi and pi in i = 1 ,..., d. 

Lemma 2.4 (Differentiability of the 2-Wasserstein distance W| of Gaussian 
distributions). Let 4) : x S+(W^ff —>■ K fee given by 

(20) (ti,n,A,B)^ ||p-z/f+tr(A)+tr(P)-2 tr((yli/2^Ai/2)i/2j . 

This mapping is Frechet differentiable and its derivative at A, B) G x 
5'+(K‘^)^ is a mapping in x given by 

9^ G, G')] = 2{p — v) ■ {g — g') + iv G + tr G' 

- E E K"9^P^GP^ql - 

( 21 ) 1=1 i=l 1=1 

d d 

-E'^E E (AzA™)-'/'gfP,GP^® 

1 — 1 i,m—l 


for all g, g' G G, G' G and with k, A, P,Q as in (191. 
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Remark 2.5. Note that the last result is stated in finite dimensional spaces. Ob¬ 
viously in this case Frechet differentiability coincides with usual differentiability. 
Nonetheless, we prefer to use the abstract setup for simpler notation, obvious 
extensions to the infinite-dimensional case and because it is consistent with the 
cited references. 


Recall that $ is a symmetric function in the entries /r and v and likewise in 
A and B. If we switch the notation in the previous theorem and then consider 
g' and G' equal to zero we obtain as an immediate consequence. 


Corollary 2.5. Let . ]^d X S'+(IR'^) —)• K &e given by $ from as a 

function of /i and A for fixed iz S and B € S+(R‘^). Then ^G,b) (g Frechet 
differentiable and its derivative at any point {g, A) € x is an element 

„ _ j -Tidxd — ' 


of L{Rf 


given by 


d 

(22) G)] =2(m - • g + trG - ^ , 

;=i 


for all g G G G and {(K/,r/),Z = l,...,c?} the eigendecomposition of 

51/2^51/2 

as in ( [I^ . 

The previous theorem also allows a simpler representation of the derivative 
if we restrict to certain cases. 

Remark 2.6 (Commutative case AB = BA). Here, we can choose the eigenbasis 
of A and B to be the same, which is thus also an eigenbasis of A^I'^BAAI'^ 
implying that Pigi = Suqi: If A), are the eigenvalues of B this implies that 
AfcAJ. = Kk and we obtain in Proposition 


2.4 


^(a,b)<(>^"^[(G,G')] = trG + trG' 


qi 


(23) 


= ^,; g « (1 - (1 - 




Z=1 

d 


. A/ 


'a; 


1=1 '' “ ^ 1=1 

This implies in particular that the derivative equals zero A = B. 

At the end of this section we state a result on the second order derivative 
of $. 

^2 _ 


Theorem 2.6. Let $ 


X S. 


be as in Proposition 2.4 
12 


The 


mapping is twice Frechet differentiable. Its second derivative at 

a point {fj,,i/,A,B) G x 5'+(IR‘^)^ is a symmetric bilinear mapping from 
U 2 d ^ ]g 2 (dxd) ^ ]j 2 d ^ ]^ 2 (dxd) which is defined by its quadratic form 

Dl,.,A,3)^(9,9, G, G'), {g,g', G, G')], g,g' G K'", G, G' G 

Remark 2.7. It would be possible to use the calculations from Corollary |B.2| in 
Theorem 2.3 to obtain an explicit formula for the second derivative for d > 2. 
However, this calculation is very tedious even for d = 2 and we will not carry it 
out here. 
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2.3. Bootstrap 

Applications of Theorems |2. 1 1 and |2 .2 1 such as the construction of confidence 
sets require one to estimate the variances v and w of the limiting distribution. 
But for the construction of confidence sets those quantities need to be estimated. 
Of course, these can be estimated from the data by their empirical counterparts 
as well. Another option is to bootstrap the limiting distribution, which becomes 
particularly useful for an application of Theorem |2.3| as the limiting distribu¬ 
tion has a complicated form. In fact, due to the differentiability results of the 


last section (Lemma 2.4 and Theorem 2.6 1 we can rigorously establish such a 


bootstrap. We illustrate the bootstrap approximation for the one sample case, 
the two sample case is analogous. For m < n we denote by ATJ",... an in¬ 
dependent resampling (with replacement) of the sample Xi,..., X^ and define 
as in ^ using that resampling. 

As in the beginning of Section|2T]a distinction for the cases P ^ Q and P = Q 
is required. The former allows an n-out-of-n bootstrap, the latter requires an 
m-out-of-n bootstrap, s.t. m = o(n). 

Proposition 2.7 in out of n bootstrap). Suppose P ^ Q. Then 


(24) - gWn) =» Af(0, v^) 

conditionally given Xi,X 2 ,... in probability. 

Here, weak convergence conditionally given Xi, X 2 ,... in probability means 
the following: Denote by p a metric corresponding to the topology of weak 
convergence and by £(■) the law of a random quantity. Then (24) means that 
p{C{QyVn — gyVn)), N{0, v^)) as a function of Xi,..., Xn converges to zero in 
probability. 

Proposition |2.7| follows immediately from Theorem 23.5 in |45j combined 
with the differentiability of Lemma |2.4| and the strong consistency of the boot¬ 
strap result for the sample mean and the sample covariance matrix of Gaussian 
distributions, respectively. Note that this follows from the bootstrap consis¬ 
tency of the multivariate empirical process (Theorem 23.7 in [1^) together with 
Hadamard differentiability of £{F). In our case this also follows immediately in 
an elementary way from the fact that S„ is independent of pn and from the fact 
that its distribution does not depend on p. Thus, it follows from the bootstrap 
consistency for the multivariate i.i.d. average ^ ^i^l- 

Note that since the left hand side of (24) only depends on the sample (and 
some further randomness) the result serves to estimate the right hand side and 
so in particular 

For P = Q we obtain the m out of n bootstrap. 


Proposition 2.8 [m out of n bootstrap). Let m = m(n) such that m{n)/n 
as n —>■ 00 . Suppose further that P = Q. Then 


0 


(25) m (ewl - mn) ^ 


II 






conditionally given Xi,X 2 ,... in probability, where Zi is the distribution from 
Theorem \2.tA 

This follows along the lines of the proof of Theorem 5.1 in [23] using the 
second order differentiability of Theorem |2.6| together with the m out of n 
bootstrap consistency result for the sample mean and sample covariance matrix 
of Gaussian distributions. 


3. Applications 


Theorems 2.1|2.3 can be used (in combination with the bootstrap results in 
Section [TS]) fo r several purposes: e.g. testing the null hypotheses H : QW — 0 
(Theorem 2.3 for the two sample case) or neighborhood hypotheses of the form 
H : QW > 6 vs. K : QW < J in order to validate the closeness of the multivariate 
normal distributions in Wasserstein distance. Here 5 > 0 is a threshold to be 
fixed in advance, see e.g. [3S] for a related test and further references on these 
types of testing problems. The test amounts to rejecting whenever rn(QW—S) > 
Ua (see Theorem 2.1 in the one sample case), where Ua denotes the a-quantile 


of a standard normal random variable. 

Another immediate consequence are (bootstrap) confidence intervals for QW; 
which require the asymptotics for P ^ Q (Theorem 2.1 and Theorem 2.2), see 
im for a general exposition and m for the Wasserstein distance on the real 
line. In the following we exemplarily illustrate our methodology for the one 
sample test on a real data application. 


3.1. Positions of amino acids in a protein 

In order to understand the biological function of proteins it is important 
to know both their three dimensional structure as well as their conformational 
dynamics. X-ray crystallography and NMR spectroscopy can help elucidate 
high-resolution information about biomolecular structures but conformational 
dynamics is more elusive, see e.g. |29j . Small-amplitude dynamics is thought 
to be reflected by crystallographic B factors, whereas NMR structures are often 
interpreted as native state ensembles. However, both interpretations should 
be taken with some caution. Therefore, it is of interest to investigate whether 
the crystallographic view on conformational dynamics provided by B factors 
agrees with the ensemble view provided by NMR. To this end, we will use the 
Wasserstein based test to quantify to what amount the local flexibility measured 
by X-ray crystallography agrees with the structural variability seen by NMR. 

Our analysis is based on the crystallographic model of proteins, see Sec¬ 
tion 2.2 of [33]. This model postulates that each amino acid in the protein is 
a point that has a position which follows a Gaussian distribution with mean 
/X S and covariance matrix /3^1, where 1 is the identity matrix in It is 
customary to assume the positions of different amino acids as independent. In 
this model, the quantity fP is called the R-factor and is related to the Debye- 
Waller factor used in crystallography. 


12 











Figure 1 : Test for Hq] regions of rejection in blue. 


We focus on a particular amino acid and want to compare the “true” dis¬ 
tribution P proposed by the crystallographic model to the samples obtained 
from NMR spectroscopy. Arguably, the Wasserstein distance is particularly 
well-suited for this scenario as it accounts for a measurement of displacement. 

In order to obtain a test for the hypothesis 


(26) Hq : the samples come from the true (Gaussian) distribution 


we apply Theorem |2.3| In the present setting we can further simplify and obtain 
an explicit description of the limit law Z^- If we assume that P = A^(p,, cr^l) as 
reference distribution in then we obtain for (151: 

=a^{2X + QX' + lX") 

for independent x§-random variables X and X' and x|-random variable X” 
with three and six degrees of freedom, respectively. We denote the a-quantile 
of the variable Zi/a'^ by ^q, a S (0,1). Then a test for (26) at level a is given 
by: 


Reject if ^S>V(P„,P) > gi_c 


We analyse the protein ubiquitin (consisting of 76 amino acids, PDB ref¬ 
erence lubq) using the crystallography data (implying P) and the NMR data 
(with sample size n = 10) from the Protein Database (RCSB PDB), see [5]. For 
each of those amino acids we test the hypothesis Hq in ( p^ . At level a = 0.05 
for 8 of the 76 amino acids we reject Hq (see Figure]^ and at level a = 0.01 for 
4 of the 76 amino acids we reject Hq. Interestingly, all the rejection appears in 
the loops of the protein which suggests that NMR and crystallographic struc¬ 
ture determination does not align well at these locations. At other locations of 
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the ubiquitin protein we did not find evidence for deviation from the normal 
distribution as predicted by the model. 

We stress that our analysis does not provide evidence for the positions not 
being jointly multivariate normal. This is an issue which would require larger 
samples but could be investigated with our methodology as well. 


4. Proof of Theorems |2.1| | |2.2| and 2.3 


In order to prove Theorem |2.2| and Theorem |2.3| we will apply the Delta 
method. To prepare for this, in Section 4.1 we provide the proofs for the 


Frechet differentiation of the Gaussian Wasserstein estimator from Section 12.21 
Section |4.2| collects the required standard results on the convergence of the 
empirical mean and covariance matrix of Gaussian distributions. Combining 
these results with the differentiation results we complete the proof of Theorems 


|2.1| and 2.2 with the exception of determining the variance of the limit which will 
be provided in SectionThe proof of Theorem |2.3| follows similar arguments 
and is also completed in Section [T^ 


4.I. Frechet differentiability 

First we sketch an application of the results from Section]^ Let 

be the set of (continuous) linear maps from to R"^ and let G = C be the 
complex numbers. Clearly, D is a Banach algebra with respect to the classical 
operator norm ||A|| = ||A|jD = sup|| 3 ,||^i ||Ax||,T S D. Consider the subspace 
S'+(R'^) C D of symmetric, positive definite matrices (which means that all 
eigenvalues are positive). Then any A G S'+(R‘^) can be written in the form 


d 

(27) T = 

i=l 


Xi G (0, oo), PiPj = SijPi ioi 1 < i < j < d. Note that this definition is different 
from |26j in that repeated eigenvalues are listed according to their multiplicity. 
Suppose i/) : D —>■ C is analytic. Then it is possible to define 'ip(A') for A' in a 


neighborhood of A e 


as in (B.2) and apply Corollary 


B.2 


We will do 


that in the two proofs to come below. 

The first proof deals with the derivative of the Gaussian Wasserstein distance 
functional. 


Proof of Proposition \2.4\ The mapping : R^^^ —>• R, (/r, v) >->■ ||/r — has 
Frechet-derivative 


(28) g'). 
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Next, we treat the second part of the mapping by considering 
K given by 

{A,B) ^ (j}^^\A,B) = tr(A) +tr(B) - 2tr . 

Here, we first consider ijj : D ^ C, z ^ where D is some open bounded 
subset of C no t containing elements of the ray M_ = (— 00 , 0]. An application of 
Corollary B.2 yields its Frechet derivative at A €S+{ 


d d ,1/2 ,1/2 

(29) D^#[2l']=^;^P./l'P.+ ~ ' 

Za^ i,k=l 

Ai/Afc 




m'Pk 


for any A! G Using this we can deduce the Frechet derivative of 

1 /2 

^ -^V, (A,B)^A + B-2 BA^/'^'^ . 

First note that by linearity. Lemmas |A.1HA.2| and |A.3[ 
D^a,bMG,G')] = G + G' 


— 2DlpJ^l/2BA^/2 


D^a[G]BA^G + a}Gg'A}G + 


G, G' € With (19) and (29) we can write the derivative more explicitly 

D(A,B)<f’l(G,G')] = G + G' 

A 

1 


- 2 E 


Ok 

/ = 1 


1/2 


/ d 

I 'y ^ —jy^QzFjGPi-Bv^Q; + Qiy/AG'y/AQi 

\i=i 2Aj 




E 2 A 


- 2 E 


d 1/2 1/2 / d 

^ 


i,fc=l 

Kl^Kk 


Ki - Kk 


y ^ — Yj^QiPiGPiB'/AQk + Qi'/AG' y/AQk 

i=i 2Aj 


+ E! — Yp2Qi''''/^BPjGPjQk 


-^EEt. e 


d .1/2 xl/2 

A,- Am 


/ = 1 i,m—l 


—^ 9 \ 

i -1 

{QlP^GPmBVAQl + QiVABP,GP^Q/j 


d ^1/2 ^1/2 d .1/2 .1/2 

_ 2 

m — f^k . . Aj — Am 

/,«=! 2 ,m=l 


Kl^Kk 


\i^X„ 


(^QiP.GP^BVAQk + QiVABP.GPmQk) ■ 
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Using the fact that 

Xy^P^B^AQi = P.VAbVAQi = kiP^Qu 
X]'^Qi\fABPj = QiVAbVAP, = KiQiPj, 

allows us to simplify the above expression to 


D^a,bMG,G')] = G + G' 


d d I 

V ^QiP^GP^Qi + -Qiy/AG'y/AQi 


/=1 2=1 


j=i 1 


d 1/2 1/2 d 


Kl — Kk 

Lk=l ‘ i=l 


- 2 E 

i,k^: 


(E ^QiP^GP.Qk + Qiy/AG'y/AQk 

d 

E ^QiPjGP.Qk) 


/=i 


d d .1/2 ,1/2 / , , N 

-E^E E -^y^J^(^QlP^GPmQl + ^QlP^GP^Ql 

- - - ^2 '^TTL \ ^TP ^ ' 


Z = 1 2,771=1 


A 


d ^ 1/2 1/2 d 1/2 , 1/2 

-2 ’■ ~ ^ ~ 

Kl — Ki^ 


i,fc=i i,^i ^A< 

Kl^Kk Ai/Am 


■{-%QlP^GPmQk 

' Am 

+ ^ 1/2 QlPiGPmQk^ ■ 


Note that this can be simplified further as several of the terms are now of the 
same form. However, in the end we will take the trace of this object which will 
lead to further reductions. We will perform these steps at the same time. The 
trace is a linear mapping so that with Lemma El we obtain 


DiA,B) (tr o0) [(G, G')] = tr (Z7 <^(a,b) (G, G')) ■ 


Now use that tr( 2 l) = J2i=i Qi^Qi for any operator A (see Lemma C.2 1 with the 
eigenbasis {qi : I = 1,..., d} where Qi = qiq\. Then all of the terms containing 
Qi and Qk for I ^ k vanish leaving us with 


(30) D^AM^^^liG,G')] = U(tro</,)(^_ 3 ) [(G,G')] 
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d d 

, 1/2 


= tr G + tr ^ ^ X-^qfP^GP^qi - ^ t^^^^^qjVAG'VA. 


m 


Z =1 


2 = 1 


Z =1 


d d 

, 1/2 


jy {X^Xmy^^\^P^GP^ql. 


/=1 


2,m=l 

AiT^Arn 


Adding this to (28) ends the proof due to Lemma A.3 


□ 


In a next step we give the proof for the result on the second order differen¬ 
tiability. 


Proof of Theorem \2.(^ We need to check that the first derivative obtained 
in Proposition [2^ is Frechet differentiable. Formally, by chain rule and linearity 
of the trace, 


9'. G, G'), (ff, g', G, G')] = 2(5 - 5 ', 5 - 5 ') 


(31) 


(dI^,b)^[{G,G'),{G,G')] 


where = (A^/^SA^/^)^/^ = if ftp {A) B if [A)) with if{G) = This 

formal derivation is valid as long as the last expression ^jl['[(G, G'), (G, G')] 
exists. 

First, let us note that if : 5'+(M'^) —)■ S'+(K‘^) is twice Frechet differentiable 
by Corollary |B.2[ Then the existence can be obtained from the chain rule in 
Lemma |A.2[ more precisely: 

DfA,B)mG, G'), (G, G')] = [G, G] 


+ D 


A1/2BA1/2 


if 


D 


l^^s){yA)Bif{AmG,G'),{G,G')] 


where G = I?(yi,B)('0(A)i3'!/'(A))[(G, G')] is used for abbreviation. The objects 
in the first line are all well-defined since if is t wice F rechet differentiable and 
if{A)Bif{A) = A^Gqa^G ^ Note that by Lemma A.l the objects in the second 
line are also well-defined. 

Dl^^B^{if{A)Bif{AmG,G'),{G,G')\ 

= D\if[G, G]Bif{A) -t 0 -b if{A)BD\if[G, G] 

+ 2DAif[G]G'if{A) + 2DAif[G]BDAif[G] + 2if{A)G'DAif[G]. 


This means that we have defined all elements in (311 rigorously, hence d> is twice 
Frechet differentiable. □ 

In the case d = 1 (so A, B are real-valued) we can explicitly calculate the 
second derivative: 


(32) 


[( 5 , 5 ', G, G'), ( 5 , g', G, G')] 


= 2 ig-g'r + 


1 

2 A1/2B1/2 


(fG 2 + |(G')^- 2 GG'). 
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4-2. The Delta method and proof of Theorems 2.2 and 2.3 

The goal of this section is to derive Theorems |2.2| and |2.3| via the Delta 
method. More precisely, we will use the following result. 


Theorem 4.1 (Theorem 20.8 of [33], Delta Method). Let 0 : D C D — >■ G be 
Frechet differentiable at 9 € Ih. Let (Tn)nen o.'^d T be random variables with 
values in D and D respectively such that rn{Tn — 9) ^ T for some sequence of 
numbers r„ —)■ c». Then 


rn{f){Tr,) - f{9)) ^ Dg<P[T]. 


If additionally, Dgcj) = 0 and 4> is twice differentiable at 9, then 

rlif{T„)-m)=^lDU[T,T]. 


Remark 4.1. In |45j the result is stated in more generality, in particular for 
Hadamard differentiable functions. Since we essentially work in finite dimen¬ 
sions this difference does not matter. The statement on second derivatives is 
not included in Theorem 20.8 of |45j . However, the proof is quite the same using 
an expansion to a higher order, see Section 20.1.1 of [33] as well as Theorem 
|B.1| of Appendix 


In order to apply this result we will use known weak convergence results 
of the empirical means and covariance matrices of Gaussian distributions to 
their true means and covariance matrices. The representation in ( |^, whose 
Frechet derivative was calculated in Proposition 2.4 (see also Corollary 2.51, then 
provides the mapping from mean and covariance matrices to the 2-Wasserstein 
distance Q of Gaussian distributions. 

For this we now return to the setting of Theorem 2.2 such that P and Q in 
A4i(K‘^) are Gaussian distributions on and ~ P are i.i.d. and independent 
from Fi ^ Q i.i.d. for z S N. A central limit theorem for the respective sample 
means and covariance matrices of a sample of size n, 


(33) 

1 

An — ~ ^ ; 

Sn = V(A,-/i„)(A, 

77,-1 

2=1 

(34) 

1 "" 

^ ^ ^ 5 

n 

2=1 

1 " 

2=1 


is well known. 


fin) I 

i>n)* 


Lemma 4.2 (Section 3 in [40]). J/P = N{p,T,) then 
(35) - /i, - S) ^ g (g) G 

where g ~ iV(0, E) and G = Here, convergence in the space x 

is understood component wise and H = {Hij)ij<d is a d x d symmetric 
random matrix with independent (upper triangular) entries and 


(36) 


N{0, 1) , i < j, 

7V(0,2) ,i=j. 
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The derivation in [30] is only given for the centered case, but (as they also 
say) it can easily be obtained in the non-centered case. 


Main lines of the proof of Theorem 2.2 

First, note that due to S and S ha ving full rank, all eigenvalues are posi¬ 


tive. Consider $ as in Proposition 


2.4 


(And 


(37) 


^m5 ^m) 


. For = yjmn/{m -f n) and Tn{x i ,., 
we obtain with the help of Lemma 


4.2 


G = M and 

Xm yii ■ • ■ 1 Vm) — 


r„ {Tn{Xi, . . . , Xn,Yi, . . .Ym) - 

^ ^(1 — a)^^^ 5 , , (1 — a)^^^G, as 


00 


with g ^ N{0, E), g' ~ iV(0, S) and G = G' = all indepen¬ 

dent of each other. The symmetric Gaussian matrices H^H' have independent 
Gaussian entries in the upper triangle with mean 0 and variance 1 off-diagonal 


and variance 2 on the diagonal. We can now apply Theorem 4.1 in order to 
obtain 


(38) 


^4^ (Am ' 




U-W2(I 

((1 - a)^/^g, a}^‘^g', (1 - a)^/^G, a^/^G') 


Since {g, g',G,G') is a Gaussian vector with mean 0 and is a linear map¬ 
ping to K we know that i:i(^.^,s.H)$(((l - a^^^g.a^^^g', (1 - a)^/^G,a^/^G')) 
is a real-valued Gaussian variable with mean 0 and a certain variance vj. This 
shows ([T^. The calculation of w leading to (|T4| is provided in Section 14.3l □ 


In the one sample case (10), i.e. Theorem 2.1 the proof is entirely analogous 


but essentially simpler: We use Theorem 4T and Lemma [4.2| as before in order 
to obtain 

(39) v^($(An, Sm S) - W|(P, Q)) ^ [(g, G)] 


where the derivative is specified in Corollary |2.5 


Again, the limit is mean 0 
Gaussian and the calculation of the variance in (11) is given at the end of Section 

[431 


Remark 4.2. A final remark is to say something about the case when E or S 
do not have full rank in Theorem |2.2| Simulations show that still a very similar 
result should hold. However, our technique (delta method, i.e. differentiation) 
will not work, as can already be seen in the case d = 1. Loosely speaking the 
derivative of the variance part in ([^ where A > 0 is an eigenvalue is « 

(not being well-defined for A = 0) which gets multiplied by the direction « A 
(see (381) yielding « A^/^ in the end. 


Proof of Theorem 12.31 


Let 4> as in Proposition 2.4 Note that 4)((/r,/i, A, A)) = 0 and the proposition 
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easily implies that D{fj.^^^A,A)^ = 0- Additionally, Proposition 2.6 says that the 
function $ is twice Frechet differentiable at the point (/i, /i, A, A) and thus we 
can apply the second part of Theorem |4.1[ This allows to deduce that 


(40) 


n(cI>(AW,Ai"\sW,Ei2))_o) 

^ [(5, 9. G, G"), [g, g\ G, G')] , 


where g ~ Ai(0,S),G = g' ~ N{0,T,),G' = are all inde¬ 

pendent of each other and as in Lemma [T^ Since is a quadratic form and 
the vector {g, g', G, G') is Gaussian we obtain the desired result. □ 

4-.3. Variance formula for the limiting Gaussian distributions 

In this section we provide the details of calculating the variance of the 
derivative (1 — a)^^'^G,a^^'^G')) in ( [ 3 ^ whose 

explicit form is given i n (pTj ) of Proposition |2.4[ The variance formula for 
„ --- „ ,, then follows in a simi- 


2.5 


A>(^,E)'l>G-'=)((g, G)) of ([39|~specified in Corollary 
lar way with the the calculation in (551 below. 

The first two terms of the representation (21) involving the means g and v 
are easily calculated, namely 


(41) 


(m - - a^'^g ^ N (0, (1 -a){g- i^)*E(/x - v)) 


in - i/)aG2g' ~ N ( 0 , a{g - - i/)) . 

The explicit calculation of the remaining terms involving the covariance ma¬ 
trices S and S is more complicated. We will frequently apply Lemma C.3 In 
the following use the eigendecomposition of A a nd given in (19). Let 

G = AAI’^BAAA where H is as in Lemma 4.2 


Then since APi = Pi A = XtPi, 


I < i < d and X)i=i -F) = I tti® terms in (21) that involve G are given by 

d d 


tr(G)-^At;/"^A-V^^.GP. 


Qi 


z=i 




Z =1 


1 — 1 2—1 Z —1 2—1 m — l,Xi^Xm 


= tr(AliL)-^K,%‘5]P,iL P, 


Z=1 

d 


2=1 

d 


E 

m—ljXi^Xm 


qi 


(42) =%t{AH)-Y,kYWY.P^PP^^i^ 

Z=1 2=1 
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where in the last line we have used the notation 


d 

(43) A=-P.+ ^ Pm 

to denote the projection onto the direction corresponding to Xi as well as on all 
other directions that have eigenvalues different from Xi. For future use we note 
that Pi is again a projection due to the orthogonality of the Pi,i = 1 ,..., d, 
meaning that 

(44) P,P^=6,jP,. 


Furthermore Pi is symmetric. With this we can calculate the second moment 
and thus the variance of the centered Gaussian of (42). 

21 


E 


/ d d 


1^1 


d 


(45) 

= E ® [p\^Hp,p]AHpj\ 

(46) 

d d 

+ E E q\PiHPiqiqlPjHP^qk 

l,k—l —1 

d d 

(47) 

“ 2 E ^ E ® ® 9? Pj HPj qi ■ 




We consider these three terms separately and start with (45). Using Lemma 
C.3 and p\pj = Sij the first line (45) simplifies to 


d d d 

[plipdjYPj + iplpj)^pp^p])] [l + tr(p,p‘)] = 2'^Xl 


2 = 1 


2=1 


Also with Lemma C.3 we obtain for (46) 


E -r-r E 


k,l^l 




qfPiPjQkqlPiPjqk + {q\PAjqk + q\P^^Xi^x,qk) 


iPiqiqkPi^ij p 'y PmqiqkPm^j=m\ 

^ m — l^Xi^Xrri . 


d 




k,l—l 2=1 


G/ 2 ^ 1 / 2 ' 

'-Z . 

k,l — l 2=1 


qkp\qiqlpt 
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"I" 'y ' QlPiQkPjQl(lkPm^j=m,i=j 


d d d 

\/2 1 / 2 ' 


X! 4Pi^kp\qiqiPt +q\Piqkp]qiqkPj 

i^l j==i^Xi^Xj 




= 2E 


1/2 1/2 
f^i l^u 


kj^l 


Id d \ 

Y^{4Pi<lkf + E <l\P^(lkqlPjqk 

\i=l j = i.A./A, j 


qlAPjqiqlPjqi + qlAPiqi ^ qUAPiqr 


Similarly, for (47) we get 

d d 

-2E4'’E 

d d d 

“ 2 E 'h^^^i<l\PjPj<li + Aj-gfPj ^ Pmqi (since^ = 1 ) 

jjZ^l m=l,Aj^Am 

d 

— 2 ^ XjqiPjPjqi + XjqiPj E! Pmqi 

j,l—l m—l^Xj^Xm 

d 

1 / 2 / * 


i=l 


= + 0 ) . 


i=i 


By putting (45) to (47) back together and adding the factor (1 — a), since in 
(40) we are dealing with (1 — aY/^G instead of G we finally obtain 


(48) E 


f(l - tr(/4iJ) - E E “ aY/‘^HP,q\ 

\ /^1 / 
d d 

= 2(1 - a) tr(A^) + 2(1 - a) ^ E ^i^GkqfPiqk 

k,l—l i—1 

d 

(49) - 4(1 - a) E i^l^^qlMi ■ 


We can do a similar calculation for the variance related to G' = : 

/ d \ 

tr{GY-Y.^;^^MA^^^G'A^/\] 


E 


/=! 


a 

(50) = E (® [q\BH'qiqlBH'qk] 


kd^l 
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(51) 

(52) 


+ E 




qk 


-2E K;^^^qlBH'qkqjA^^^B^/^H'B^^^A^/^qi ). 


Here, the first term in (50) simplifies with the help of Lemma C.3 and Lemma 
C.l as well as qfqk = Ski and tT{qiqlB) — qlBqi to 


{q!B{qiqlBYqk + qjBqktT{qiqlB)) =J2q!B^qi+ QiBqkqlBqi 


k,l^l 


/ = 1 




= 2j2qlB\i = 2tTiB^). 


1=1 


Using Lemma |C . 3| and the fact that Ki and qi are th e ei genvalues and orthonor¬ 
mal eigenvectors of A'^/'^BA'^/'^ the second term in (511 reduces to 


E 

k,l=l 


- 1/2 - 1/2 


= E 

k,l=l 


- 1/2 - 1/2 


^t^l/2^1/2 (51/2^1/2^^^* ^l/2^1/2j B ^/^ A^/\k 

+g‘Hi/2Hi/2iji/2^i/2g^,tr(Bi/2^i/2^^gt^i/2^i/2)^ 

[4A^I^BA^I\k4A^I‘^BA^I\k 

tr {qlA^^^B^^^B^^^A^^\ 


= E '^I'^k {qUkqlqk + qUkqiqi) = 2^= 2tr(HH). 


kd^l 


fe=l 


Finally, with Lemmas C.3 and C.2 the third term in (52) leads to 
-2 ^ n;^f^{qlB{qkqlA^/^ByyB^/^A^/\l 

k,l^l 

+ qlBB^/^A^/^qi ii{qkq\A^/^B^/^)) 
^ {qlB^/^A-^/^A^/^BA^/\iqiB^/^A^/\i 


= -2 


fc/=i 


= _ 2 ^ lA /^ (9^51/2^-1/29,9* 51/2^1/29, + 9* 5i/2+-i/29,g*Hi/25i/2 

k,l^l 

= -^^n]/\\A-A^BAA\i. 


qk 


1=1 
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Thus, we obtain from the simplihcations of (50) to (52) and using the factor a 
from (40), 


(53) E 


ai/2 tr(G') - ^ 

. /-I / 

d 

= 2atr(i?^) + 2otr(^i3) — Ky^qjA~^^'^BA^^'^qi. 


1=1 


Finally, from (41), (48) and (53) (now replacing A and i? by E and S) as well 
as the independence of g,g',G,G' we obtain that the variance in (38) of the 
random variable T>(^,i/,e,h)‘J’((5 , G, GO) is given by 


{fi — I'Yiil — a)E + a^){fi — ly) + 2(1 — a) tr(S^) 

d d d 

+ 2(1 - a) ^ X! ^i^i^kqfPiqk - 4(1 - a) X! 

k,l^l 1^1 

d 

+ 2atr(S^) + 2atr(E5) — 4a 

1=1 


If all eigenvalues are distinct we have that Pi = I ,i = ■ - d and therefore using 

qlqk = du it also follows that 

d d d d 

XI X <l\P^qkq\P^qk = X X ddP^qi 

k,l—l i—1 1 — 1 i—1 

d d d d d 

= X X diPzPlqi = X '^I'^PidGiPi = = tr(E5). 

1 = 1 i=l 1=1 i=l 1=1 

Thus, in this case the expression for the variance reduces to 

(/r — i^)*((l ~ + a5)(/r — v) + 2tr((l — a)E^ + aS^) + 2tr(E5) 

- 4^] Ky"g*((l - a)E + aE-G2sEG2)q,. 

1=1 


We have chosen to also use this representation for the general case together with 


the fact that by (43), 


(54) h = I- Y. 


j=l 

j^i^\i=\j 


This yields (14) of Theorem 2.2 


To obtain (11) of Theorem 2.1 we need to be careful. Recall that in Corollary 


2.5 the derivative is given with the terms of (/r, A) and (^, B) being reversed. So 
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we need to follow the previous calculation for {g,G) = 0 and reverse the roles 
of (/J,, S) and S) finally, i.e. set A = S and B = T,. So we only obtain the 
second term in (41) and the terms in (53) for a = 1. 


(55) = (i/-/i)‘S(i/-/r) + 2tr(E2) + 2tr(SE) . 


1^1 


Here, {{Ki,ri) : Z = 1,... ,d} is the eigendecomposition of 


A. Functional derivatives: A reminder 

We start by collecting some basic facts on Frechet differentiability in an 
abstract setting. Let D and G be normed linear spaces, D C D open and 
: D —)■ G. The function (p is Frechet differentiable at 0 € D if there exists a 
continuous, linear map Dgp : D — >■ G such that as ||/i||d —t 0, 

1 ii(<^(0 +h)- m) - Demik ^ 0 . 

||h||D 

This concept also extends to higher order derivatives. E.g. for the second 
derivative in the setting above, the mapping Zl. : D —>■ T(D, G) is asked to 
be Frechet differentiable; here L(D,G) denotes the space of continuous linear 
mappings from D —>■ G. Since the second derivative is a bilinear form it suffices 
to dehne it on the diagonal elements. In the following we collect a number of 
calculation rules for Frechet derivatives that will be used frequently later on. 
References for the results are [46l Section 3.9], Section 3 in [ 8 ] or the classical 
sources m and [3] for a general overview. First, if (G, •) is a Banach algebra 
then a product rule holds. 

Lemma A.l (Product rule). Suppose that 0:]D)CD—^G, ^G 

are Frechet differentiable. Then their product (p ■ pj gD ^ G is also Frechet 
differentiable in D and 

DeicP ■ ^)[/i] = DgP[h] ■ xP{e) + m ■ De^P[h], h e D. 

Additionally, z/^rDcD—^-G, '0:DcD—^-G are twice Frechet-differentiable, 
then its product ci-'i/jiDcD—^-G is also twice Frechet-differentiable in D and 
for 0, h G D, 

Dg{(p ■ ip)[h, h] = Dg(p[h, h] ■ tp{9) + 2Dg(p[h] ■ Dgip[h] + (p{6) ■ Dgip[h, h]. 

We also have a chain rule. 

Lemma A.2 (Chain rule). Let (() : D C D —>■ G and : G C G —>■ E with 
(;ii(ID)) G G be Frechet differentiable at 0 G D, tpiO) G G respectively. Then ip o p 
is Frechet differentiable at 9 with derivative 

Dg{ip o p)[h] = D,j,i^g)1p[Dgp[h\], /i G D. 
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Here, the right hand side is a linear mapping from D to E. If (j) and ip are 
twice Prechet differentiable at the respective points, then ip o p is twice Frechet 
differentiable at 9 with second derivative given by the quadratic form 

Djiip o p)[h, h] = Dlf^g^tplDeplh], Dgp[h]] + D,p(^e)ip[Dlp[h, h]], hG D. 

The second part of the lemma can be deduced as in the finite-dimensional case. 
It is also an elementary observation to obtain the following result on the Frechet 
derivative of projections. 

Lemma A. 3 (Projection). Let D = Di x D 2 be a product space of two normed 
spaces and Di C Di open. Let p : Ihi ^ G be Frechet differentiable on Di 
with Frechet derivative Dg.,^p at the point 61 G Di. Then ip : Di x D 2 —t G, 
(01,02) 'P(^i) is Frechet differentiable in Di x D 2 with Frechet derivative 

D(ei,e 2 )' 4 ’{{hi: ^2)) = Dg^p{hi), hi G Di, ^2 G D2. 

Proof. We have that for (0i,02) G Di x D 2 , 


lim 


||(/ll, h2)||] 


-||V'((01,02) + {huh2)) - V'((01,02)) - Dg,P{hi)\\G 


< lim 


||?ii||di-s- 0 ||hi 


-\\p{9i + hi) - p{9i) - Dg^p{hi 


= 0 . 


□ 


B. A second order result on Prechet derivatives 

We closely follow Chapter 3 of [5^ and extend their results to a derivative 
of second order. Consider a separable Hilbert space H and the class of bounded 
linear operators C from H to FL. Its subclasses of Hermitian and compact 
Hermitian operators are denoted by C-u and C-u. 

For any T G C the spectrum (t(T) is contained in a bounded open region 
n = Vl{T) C C. Assume that Ll has a smooth boundary F = dFl with 

5t,t = dist(r, cr(T)) > 0. 

Assume additionally that C I? for an open set D C C and that p : D —>■ C is 
analytic. Define 

(B.l) Mr = max 10 ( 2 :)I < 00 , Lr = length of F < 00 . 
zer 

On the resolvent set p{T) = the resolvent given by 

R{z,T) = (M-r)-i 

is well-defined and analytic. This allows to define the operator 
(B.2) p{T) = ^ ^ (p{z)R{z) dz. 
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Define additionally for G G C: 


(B.3) 


Dr<^[G] = ^ 


L 


(t){z)R{z,T)GR{z,T) d 2 , 


(B.4) 


D^rmG] = ^^ 


L 


4>{z)R{z,T){GR{z,T)f Az, 


(B.5) 



L 


(t)iz)R{z, T){GR{z, T) f{I - GR{z, T))-^ dx. 


We will see in a moment that Dxft) and are the first and second Frechet 
derivatives of (j). The second derivative is a symmetric bilinear form. Recall 
that symmetric bilinear forms are characterized by their corresponding 

quadratic form Q{-) via the polarization identity. 

By Lemma VII.6.11 in [20] there is a constant K = |r| sup^gp \\R{z, r)|| < oo 
such that 


(B.6) ||R(z,T)|U< VzeflF 


or.T 


Next, we derive an extension of Theorem 3.1 in [M| . 

Theorem B.l. Suppose that </) : D C C —)■ K is analytic and T G C with 
u{T) C D(T) C D with 

5t,t = dist{T,a{T)) > 0. 

Then (j) maps the neighborhood 

{T = T + G : G G C, ||G|j£ < cSt^tIK for some c < 1} 

into C. This mapping is twice Frechet differentiable at T, tangentially to C, with 
bounded first derivative DT(f :£—>■£ and the second derivative is characterized 
by its diagonal form :£—)■£. More specifically, we have 


(B.7) ^{T + G) = cf{T) + Dt^[G] + D^cflG, G] + S^,t,2 [G] 


with 



Proof. We have for all G S £ with ||G||£ < c6r,TK ^ by (B.6| that 


(B.9) ||Gi?(^)|U < c. 
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This allows to calculate 
(B.IO) 

R{z,T){I - GRiz,T))-^ =R{z,T) [{R{z,T)-^ - G)R{z,T)\ ^ 

= [zI-T-G]~^ =Riz,f) 

for any z £ with T — T + G as above. As the left hand side of the previous 
equation is well-defined, we conclude that z G p{T)- Thus, cr{T) C and the 
mapping (j) applied to T = T -|- G is well defined via 


(B.ll) 

(j)(T + G) = ^ [ cl)(z)R(z,T + G)dz for G G £ with IIGIG < c5r r/i^- 
Using a Neumann series expansion we can obtain 


i?(z, T + G)= R{z, T) {I + GR{z, T) + {GR{z, T)f 

+ (Gi?(z, T)f{I - GR{z, T))-i) 

= i?(z, T) -f i?(z, T)GR{z, T) + R{z, T)GR{z, T)GR{z, T) 
+ R{z, T){GR{z, T)f{I - Gi?(z, T))-\ 


and inserting this i nto ( B.ll) a llow s to obtain ra. The bound on S^^t, 2 [G] can 
be obtained from (B.5) using (B.l) as well as (B.6) and \\{I — GR{z,T))~^\\c < 
(l-||Gi?(z,r)|Ur^ (l-c)^by ra. □ 


Now let us restrict T to the subset C-h of compact Hermitian operators. That 
allows a representation 


(B.12) T = 

i=l 

where Ai G M are eigenvalues and Pi are orthogonal projections onto one¬ 
dimensional eigenspaces (since T is compact, to each non-zero eigenvalue there is 
a finite-dimensional eigenspace that can be decomposed into orthogonal spaces). 
Then the resolvent has the following form 

C50 ^ 

(B.13) R{z, T)=Y, ^ ^ 


and for (j) : D C <y{T) —>■ K: 


(B.14) </)(£) = ^<(.(A)P,. 
2=1 
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Corollary B.2. Let the conditions of Theorem B.l he fulfilled for T G C-h with 
expansion (B.12). In this case 

(B.15) 

OrfilG] = f2 fi'{\)P,GP, + ^ i^h)zMbl p^GPk 


i^k 




and 


Dlfi[GM= Y. t{^^^^^^y^,^^^}P,GP,GP^ 

i^j^k—1 

{\j — \k)4>{\i) + (Afe — Xi)(j){Xj) + {Xi — Xj)(j){Xk) 


A^(Afe — Xj) + X^(Xi — Xk) + X^{Xj — Xi) 


(B.16) 


(B.17) 


+ E 


1 / 


{PjGPjGPi + P.GPjGPj + PjGPiGPj) 

i,j = l “ ^3 


fi'{X,) - 


fijXi) - fijXj) 

Xi Xj 


+ Y^”(.K)P^G^GP,. 


for all G G C. 


Proof. We can use the explicit form of the resolvent from (B.13) in (B.3) and 
(B.4). We restrict our attention to the second derivative since the first derivative 

(j){z) 


was already explained in [35]. Thus, 


DimG]= Y. h 


i,j,kLi "'r {z - Xi){z - X^)(z - Xk) 


dz PiGPiGPk. 


Note that for pairwise different Xi, Xj, Xk- 
1 1 1 


— Xi z — Xj z — Xk A?(Aj — Xk) + X?j{Xk — Xi) + A^(Ai — Xj) 

Xj Xk Xk A^ Xi Xj 


z — Xi z — Xj z — Xk 


Additionally, for Xi = Xj 7 ^ Xk'- 

111 1 


Z Xi Z Xj Z Xk Xi Xk 


1 


[(z-A)2 iz-X,)iZ-Xk) {z-Xk){Z-Xk)\' 
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This allows to derive 


OO « 

Dld,[G,G]= Y. / d,{z) 

JT 


i,j,k=l ‘ 


1 1 1 
z — \i z — Xj z — Xk 


dz P.GPjGPk 


i,j,k=l 

{Xj — Xk)(j>{Xi) + {Xk — Xi)(j){Xj) + {Xi — Xj)(j){Xk) 


~ ^k) + XVXk - Xi) + Xl{Xi - Xj) 


+ E !{>.= 




■ ,k—l 

OO 


+ E 


=Afc^Ai} 


^j = l 


A? Xk 


Xj Xi 




cj^'iX,) - 


<^(A,) - <^(Afc) 


Xi Xk 

(/)(Aj) - (/)(A,) 


^ ^ ^{Afc—Ai^Aj} 


Xk Xj 


OO 

+ ^ (/)"(A,)P,GP,GP,. 


4^'{Xk) — 


Xj Xi 

</>(Afc) - (l){Xj) 


Xk Xj 


P^GP.GPk 


PiGPjGPj 


PkGPjGPk 


i=l 


Now a relabeling of the indices allows to obtain the result we wanted to show. □ 


C. Some elementary facts on matrices 

The next results are elementary but as we regularly use them we state them 
here. 

Lemma C.l (Theorem 2.8 of |1H])- Let A and B be m x n and nxm complex 
matrices, respectively. Then AB and BA have the same non-zero eigenval¬ 
ues, counting multiplicity. In particular for symmetric positive definite S and 
S; eigenvalues of {A^I’^BAfil'^) and {AB) are the same, counting multiplicity. 
Moreover, 

(C.l) tr(y4P) =tT:{BA). 

A helpful tool for calculating the trace is the following lemma. 

Lemma C.2. Let {xi,... ,Xd} be any orthonormal basis ofW^ and A S 
Then 

d 

(C.2) tr(A) = 'Y. x\Axi. 

i=l 
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Proof. Let P = {x\,..., G so the first row of P is xi and so on. Then 

P^P = 1, i.e. P is unitary and thus, 

d d d 

tr(A) = Ajj = Sk=jAjk = {P P)kjAjk 

j = l j,k=l j,k=l 

d d 

= Pi,^AkjPij = y^ x^Axj. 

2,j,A;=l i—1 


□ 

Recall the matrix H of Lemma |4.2| It is the prototype of matrix which 
appears in the next lemma. 

Lemma C.3. Let PI G be symmetric with independent centered Gaussian 

entries in the upper triangular part s.t. Hu ~ fV(0, 2) for 1 < i < d and Hij ^ 
iV(0,1) for 1 < i < j < d. Let m,n G N. For C G D G and 

E G it holds that 

(C.3) E[{CHDHE)^^] = {CD*E)^^ + (CE)^^ ■ tr(D), I < i < m, 1 < j < n. 
Proof. We note that 1 < k,l,p,q < d we have 

Ei[HklHpq\ 21|^—/—p—+ ^^k—p^l—q} T ^{k—q^l—p} • 

We can use that on the matrix product 

d 

{CHDHE)^^= Y, C^kHkiDipHpgEg^ 

k,l,p,q—l 


to evaluate 

d 

E[{CHDHE)^^]=2 Y l{fc=i=p=g}C'.fcAp£^« 

kj,p,q—l 

d 

+ E ^{k—p^l—q} CikDipEqj 

k,l,p,q—l 

d 

+ E ^{k—q^l—p} CikDlpEqj 

k,l,p,q—l 

d d d d d 

= 2YC.kDkkEk,+Y E CikEij^Eij + E E CikDiiEkj 

k—1 k—1 1 — l.^l^k k—1 l—\^l^k 

= {CD^E)^^ + {CE)^yir{D). 

□ 
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